Nine Features in a Random Forest to Learn Taxonomical Semantic Relations

نویسندگان

  • Enrico Santus
  • Alessandro Lenci
  • Tin-Shing Chiu
  • Qin Lu
  • Chu-Ren Huang
چکیده

ROOT9 is a supervised system for the classification of hypernyms, co-hyponyms and random words that is derived from the already introduced ROOT13 (Santus et al., 2016). It relies on a Random Forest algorithm and nine unsupervised corpus-based features. We evaluate it with a 10-fold cross validation on 9,600 pairs, equally distributed among the three classes and involving several Parts-Of-Speech (i.e. adjectives, nouns and verbs). When all the classes are present, ROOT9 achieves an F1 score of 90.7%, against a baseline of 57.2% (vector cosine). When the classification is binary, ROOT9 achieves the following results against the baseline: hypernyms-co-hyponyms 95.7% vs. 69.8%, hypernyms-random 91.8% vs. 64.1% and co-hyponyms-random 97.8% vs. 79.4%. In order to compare the performance with the state-of-the-art, we have also evaluated ROOT9 in subsets of the Weeds et al. (2014) datasets, proving that it is in fact competitive. Finally, we investigated whether the system learns the semantic relation or it simply learns the prototypical hypernyms, as claimed by Levy et al. (2015). The second possibility seems to be the most likely, even though ROOT9 can be trained on negative examples (i.e., switched hypernyms) to drastically reduce this bias.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

VHR Semantic Labeling by Random Forest Classification and Fusion of Spectral and Spatial Features on Google Earth Engine

Semantic labeling is an active field in remote sensing applications. Although handling high detailed objects in Very High Resolution (VHR) optical image and VHR Digital Surface Model (DSM) is a challenging task, it can improve the accuracy of semantic labeling methods. In this paper, a semantic labeling method is proposed by fusion of optical and normalized DSM data. Spectral and spatial featur...

متن کامل

Integrating Semantic Relatedness and Words' Intrinsic Features for Keyword Extraction

Keyword extraction attracts much attention for its significant role in various natural language processing tasks. While some existing methods for keyword extraction have considered using single type of semantic relatedness between words or inherent attributes of words, almost all of them ignore two important issues: 1) how to fuse multiple types of semantic relations between words into a unifor...

متن کامل

Author gender identification from text using Bayesian Random Forest

Nowadays high usage of users from virtual environments and their connection via social networks like Facebook, Instagram, and Twitter shows the necessity of finding out shared subjects in this environment more than before. There are several applications that benefit from reliable methods for inferring age and gender of users in social media. Such applications exist across a wide area of fields,...

متن کامل

برچسب‌زنی نقش معنایی جملات فارسی با رویکرد یادگیری مبتنی بر حافظه

Abstract Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...

متن کامل

Investigating the mechanism of the void's physical-semantic effect on social interactions

The depth of the void concept has extended the range of its effects from philosophy to various sciences and even types of art. In architecture, due to the importance of the spacing effect and architectural components on behavior, void finds a different role that seems to be less addressed in contemporary architecture. If void, regardless of its hidden meaning, is referred to as "empty space," a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1603.08702  شماره 

صفحات  -

تاریخ انتشار 2016